Identification of Telugu, Devanagari and English Scripts Using Discriminating Features
Abstract
In a multi-script, multilingual environment, a document may contain text lines in more than one script or language. The different script regions of such a document must be identified so that each region can be passed to the OCR system for its language. In this context, this paper proposes a model to identify and separate text lines of Telugu, Devanagari and English scripts in a printed trilingual document. The proposed method uses distinguishing features extracted from the top and bottom profiles of the printed text lines. The experiments used 1500 text lines for learning and 900 text lines for testing. The reported performance is 99.67%.
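The abstract does not say how the top and bottom profiles are computed or which features are derived from them. As a rough illustration only, the sketch below uses one common definition: for each pixel column of a binarized text line, the top profile is the row of the first ink pixel seen from above and the bottom profile is the row of the last ink pixel. The function name `top_bottom_profiles` and the list-of-lists image format are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Profile = List[Optional[int]]


def top_bottom_profiles(line: List[List[int]]) -> Tuple[Profile, Profile]:
    """Compute per-column top and bottom profiles of a binary text line.

    `line` is a 2D list of rows, where 1 marks an ink pixel and 0 marks
    background (an assumed input format). For each column, the top profile
    holds the row index of the first ink pixel from the top and the bottom
    profile holds the row index of the last ink pixel. Columns without ink
    get None in both profiles.
    """
    if not line:
        return [], []
    height, width = len(line), len(line[0])
    top: Profile = [None] * width
    bottom: Profile = [None] * width
    for col in range(width):
        for row in range(height):
            if line[row][col]:
                if top[col] is None:
                    top[col] = row   # first ink pixel from the top
                bottom[col] = row    # keeps updating to the last ink pixel
    return top, bottom


# Tiny hand-made example (4 rows x 4 columns), not real script data.
image = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
top, bottom = top_bottom_profiles(image)
print(top)     # [1, 0, None, 1]
print(bottom)  # [1, 2, None, 3]
```

Script-specific cues, such as a continuous headline along the top of a Devanagari line, would show up as characteristic shapes in profiles of this kind. Which features the paper actually extracts is not stated in the abstract.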
Similar resources
Proposal on Handling Reph in Gurmukhi and Telugu Scripts
Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts. Devanagari is described in Section 9.1; the principles of Indic scripts are covered in some detail in the introduction to Devanagari. The descriptions of the remaining Indic scripts were abbreviated highlighting any differences from Devanagari where appropriate. Some of the problems in this des...
Kannada, Telugu and Devanagari Handwritten Numeral Recognition with Probabilistic Neural Network: A Script Independent Approach
In this paper, a script-independent automatic numeral recognition system is proposed. A single algorithm is proposed for recognition of Kannada, Telugu and Devanagari handwritten numerals. In general, the number of classes in a numeral recognition system for a single script/language is 10. Here, three scripts are considered for numeral recognition, forming 30 classes. In the proposed method 30 classes ha...
Wavelet Packet Based Texture Features for Automatic Script Identification
In a multi script environment, an archive of documents printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documents printed in ten Indian scripts ...
A survey on optical character recognition for Bangla and Devanagari scripts
Abstract. The past few decades have witnessed an intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has been also reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on ...
A Survey of Feature Extraction and Classification Techniques Used In Character Recognition for Indian Scripts
The Constitution of India, under its Eighth Schedule, has recognized Hindi (in Devanagari script) and English as official languages of the Union Government, along with 22 other languages as Scheduled Languages, and has given status and official encouragement to these Scheduled Languages. Most optical recognition research has been done on the Devanagari, Telugu, and Bangla scripts. Develo...
Publication date: 2009